Phrase Discovery for English and Cross-language Retrieval at TREC 6

Authors

  • Fredric C. Gey
  • Aitao Chen
Abstract

Berkeley's experiments in TREC-6 center around phrase discovery in topics and documents. The technique of ranking bigram term pairs by their expected mutual information value was utilized for English phrase discovery as well as Chinese segmentation. This differentiates our phrase-finding method from the mechanistic one of using all bigrams which appear at least 25 times in the collection. Phrase finding presents an interesting interaction with stop words and stop word processing. English phrase discovery proved very important in a dictionary-based English to German cross language run. Our participation in the filtering track was marked with an interesting strictly Boolean retrieval as well as some experimentation with maximum utility thresholds on probabilistically ranked retrieval.
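
The bigram-ranking idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give their formula, so the code assumes one common reading of "expected mutual information", namely the mutual information summed over all four cells of a bigram's 2x2 contingency table. The function names `expected_mutual_information` and `rank_bigrams`, the whitespace tokenization, and the toy sentence are all hypothetical.

```python
import math
from collections import Counter


def expected_mutual_information(n11, n1x, nx1, n):
    """Mutual information over the 2x2 contingency table of a bigram.

    n11: occurrences of the bigram (first, second)
    n1x: bigrams whose first word is `first`
    nx1: bigrams whose second word is `second`
    n:   total number of bigrams
    Assumption: this four-cell sum is one common definition; the
    abstract does not state which formula the authors used.
    """
    # Each cell is (joint count, row marginal, column marginal).
    cells = [
        (n11, n1x, nx1),                          # first and second
        (n1x - n11, n1x, n - nx1),                # first, not second
        (nx1 - n11, n - n1x, nx1),                # not first, second
        (n - n1x - nx1 + n11, n - n1x, n - nx1),  # neither
    ]
    total = 0.0
    for joint, row, col in cells:
        if joint > 0:  # empty cells contribute nothing
            total += (joint / n) * math.log2(joint * n / (row * col))
    return total


def rank_bigrams(tokens):
    """Return (bigram, score) pairs sorted by descending score."""
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    pair_counts = Counter(bigrams)
    first_counts = Counter(first for first, _ in bigrams)
    second_counts = Counter(second for _, second in bigrams)
    scored = []
    for (first, second), count in pair_counts.items():
        score = expected_mutual_information(
            count, first_counts[first], second_counts[second], n
        )
        scored.append(((first, second), score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy text, not data from the paper.
    text = "new york is big and new york is old"
    for pair, score in rank_bigrams(text.split())[:3]:
        print(pair, round(score, 3))
```

Unlike a fixed cutoff such as "at least 25 occurrences", a score of this kind rewards word pairs that co-occur more often than their individual frequencies would predict, so a frequent but uninformative pair can rank below a rarer, strongly associated one.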

Similar resources

Manual Queries and Machine Translation in Cross-Language Retrieval and Interactive Retrieval with Cheshire II at TREC-7

For TREC-7, the Berkeley ad-hoc experiments explored more phrase discovery in topics and documents. We utilized Boolean retrieval combined with probabilistic ranking for 17 topics in ad-hoc manual entry. Our cross-language experiments tested 3 different widely available machine translation software packages. For language pairs (e.g. German to French) for which no direct machine translation was a...


Exploiting the LDC Chinese-English Bilingual Wordlist for Cross Language Information Retrieval

We investigated using the LDC English/Chinese bilingual wordlists for English-Chinese cross language retrieval. It is shown that the Chinese-to-English wordlist can be considered as both a phrase and word dictionary, and is preferable to the English-to-Chinese version in terms of phrase translation and word translation selection. Additional techniques such as frequency-based term selection, tra...


TREC-9 Cross-Language Information Retrieval (English-Chinese) Overview

Fredric Gey and Aitao Chen, UC DATA and SIMS, University of California, Berkeley (e-mail: gey@ucdata.berkeley.edu, aitao@sims.berkeley.edu). Abstract: Sixteen groups participated in the TREC-9 cross-language information retrieval track which focussed on retrieving Chinese language documents in response to 25 English queries. A variety of CLIR approaches were tested and a rich ...


TREC-9 CLIR Experiments at MSRCN

In TREC-9, we participated in the English-Chinese Cross-Language Information Retrieval (CLIR) track. Our work involved two aspects: finding good methods for Chinese IR, and finding effective translation means between English and Chinese. On Chinese monolingual retrieval, we investigated the use of different entities as indexes, pseudorelevance feedback, and length normalization, and examined th...


Chinese Document Retrieval at TREC-6

The TREC-6 conference was the fourth year in which document retrieval in a language other than English was carried out. In TREC-3, 4 groups participated in an ad hoc retrieval task on a collection of 208 Mbytes of Mexican newspaper text in the Spanish language. In TREC-4 there were 10 groups who participated, once again in an ad hoc document retrieval task on the same Mexican newspaper texts bu...




Publication date: 1997